Scalability of Learning Arbiter and Combiner Trees from Partitioned Data
نویسندگان
چکیده
Much of the research in inductive learning concentrates on problems with relatively small amounts of data residing at one location. In this paper we explore the scalability of learning arbiter and combiner trees from partitioned data. Arbiter and combiner trees integrate classiiers trained in parallel from small disjoint subsets. Previous work demonstrated their eecacy in terms of accuracy, this paper discusses their performance in terms of speedup and scalability. The performance of serial learning algorithms is evaluated. The performance of the algorithms used to construct combiner and arbiter trees in parallel is then analyzed. Our empirical results indicate that the techniques can eeectively scale up to large datasets with millions of records.
منابع مشابه
Scalability of Hierarchical Meta-learning on Partitioned Data
In this paper we study the issue of how to scale machine learning algorithms, that typically are designed to deal with main-memory based datasets, to eeciently learn models from large distributed databases. We have explored an approach called meta-learning that is related to the traditional approaches of data reduction commonly employed in distributed database query processing systems. We explo...
متن کاملLearning Arbiter and Combiner Trees from Partitioned Data for Scaling Machine Learning
Knowledge discovery in databases has become an increasingly important research topic with the advent of wide area network computing. One of the crucial problems we study in this paper is how to scale machine learning algorithms, that typically are designed to deal with main memory based datasets, to efficiently learn from large distributed databases. We have explored an approach called meta-lea...
متن کاملA Study of Meta-Learning in Ensemble Based Classifier
-The idea of ensemble methodology is to build a predictive model by integrating multiple models. It is wellknown that ensemble methods can be used for improving prediction performance. Researchers from various disciplines such as statistics and AI considered the use of ensemble methodology. Meta-learning is a technique that seeks to compute higher-level classifiers (or classification models), c...
متن کاملدستهبندی دادههای دوردهای با ابرمستطیل موازی محورهای مختصات
One of the machine learning tasks is supervised learning. In supervised learning we infer a function from labeled training data. The goal of supervised learning algorithms is learning a good hypothesis that minimizes the sum of the errors. A wide range of supervised algorithms is available such as decision tress, SVM, and KNN methods. In this paper we focus on decision tree algorithms. When we ...
متن کامل{24 () Parallel Formulations of Decision-tree Classiication Algorithms
Classiication decision tree algorithms are used extensively for data mining in many domains such as retail target marketing, fraud detection, etc. Highly parallel algorithms for constructing classiication decision trees are desirable for dealing with large data sets in reasonable amount of time. Algorithms for building classiication decision trees have a natural concurrency, but are diicult to ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007